-
Object-oriented data analysis is a fascinating and evolving field in modern statistical science, with the potential to make significant contributions to biomedical applications. This statistical framework facilitates the development of new methods to analyze complex data objects that capture more information than traditional clinical biomarkers. This paper applies the object-oriented framework to analyze physical activity levels, measured by accelerometers, as response objects in a regression model. Unlike traditional summary metrics, we utilize a recently proposed representation of physical activity data as a distributional object, providing a more nuanced and complete profile of individual energy expenditure across all ranges of monitoring intensity. A novel hybrid Fréchet regression model is proposed and applied to US population accelerometer data from the National Health and Nutrition Examination Survey (NHANES) 2011-2014. The semi-parametric nature of the model allows for the inclusion of nonlinear effects for critical variables, such as age, which are biologically known to have subtle impacts on physical activity. Simultaneously, the inclusion of linear effects preserves interpretability for other variables, particularly categorical covariates such as ethnicity and sex. The results obtained are valuable from a public health perspective and could lead to new strategies for optimizing physical activity interventions in specific American subpopulations.
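To make the distribution-as-response idea concrete, here is a minimal sketch of the standard global Fréchet regression step for distribution-valued responses under the 2-Wasserstein metric, with each subject's activity distribution represented by its quantile function on a common probability grid. This is not the hybrid semi-parametric model proposed in the paper: the function name, the covariate examples, and the isotonic projection used to keep the prediction a valid quantile function are illustrative assumptions.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression  # projection onto monotone quantile functions

def global_frechet_wasserstein(X, Q, x_new, grid):
    """Global Frechet regression sketch for distributional responses.

    X     : (n, d) Euclidean covariates (e.g., age, BMI) -- illustrative choices.
    Q     : (n, m) array; row i is subject i's quantile function evaluated on `grid`.
    x_new : (d,) covariate profile at which to predict.
    grid  : (m,) probability levels in (0, 1).
    Returns the predicted quantile function of length m.
    """
    n = X.shape[0]
    x_bar = X.mean(axis=0)
    Sigma_inv = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
    # Empirical Frechet regression weights: s_i(x) = 1 + (x - x_bar)' Sigma^{-1} (X_i - x_bar)
    s = 1.0 + (X - x_bar) @ Sigma_inv @ (x_new - x_bar)
    # Weighted average of quantile functions; the weights average to 1 but can be negative,
    # so the result is projected back onto the set of non-decreasing functions.
    q_hat = (s[:, None] * Q).sum(axis=0) / n
    return IsotonicRegression().fit_transform(grid, q_hat)
```

In Wasserstein space the Fréchet barycenter of distributions reduces to averaging quantile functions, which is why the sketch works entirely on the quantile scale.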
-
Estimation of the mean and covariance parameters for functional data is a critical task, with local linear smoothing being a popular choice. In recent years, many scientific domains have been producing multivariate functional data for which $p$, the number of curves per subject, is often much larger than the sample size $n$. In this setting of high-dimensional functional data, much of the developed methodology relies on preliminary estimates of the unknown mean functions and the auto- and cross-covariance functions. This paper investigates the convergence rates of local linear estimators in terms of the maximal error across components and pairs of components for mean and covariance functions, respectively, in both $L^2$ and uniform metrics. The local linear estimators utilize a generic weighting scheme that can adjust for differing numbers of discrete observations $N_{ij}$ across curves $j$ and subjects $i$, where the $N_{ij}$ vary with $n$. Particular attention is given to the equal weight per observation (OBS) and equal weight per subject (SUBJ) weighting schemes. The theoretical results utilize novel applications of concentration inequalities for functional data and demonstrate that, similar to univariate functional data, the order of the $N_{ij}$ relative to $p$ and $n$ divides high-dimensional functional data into three regimes (sparse, dense, and ultra-dense), with the high-dimensional parametric convergence rate of $\{\log(p)/n\}^{1/2}$ being attainable in the latter two.
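For intuition about how the OBS and SUBJ schemes enter the estimator, below is a minimal sketch of a weighted local linear smoother for the mean function of a single component. The Gaussian kernel, the function name, and the fixed bandwidth are assumptions made for illustration; the sketch does not reproduce the paper's multivariate setting or its convergence analysis.

```python
import numpy as np

def local_linear_mean(times, values, t_eval, h, scheme="OBS"):
    """Weighted local linear estimate of a mean function from discretely observed curves.

    times, values : lists of length n; times[i], values[i] hold the N_i observation
                    times and values for subject i (one functional component).
    t_eval        : evaluation grid; h : bandwidth.
    scheme        : "OBS" (equal weight per observation) or "SUBJ" (equal weight per subject).
    """
    n = len(times)
    N = np.array([len(t) for t in times], dtype=float)
    w = np.full(n, 1.0 / N.sum()) if scheme == "OBS" else 1.0 / (n * N)
    t_all = np.concatenate(times)
    y_all = np.concatenate(values)
    w_all = np.repeat(w, N.astype(int))          # per-observation weight under the chosen scheme

    t_eval = np.asarray(t_eval, dtype=float)
    mu_hat = np.empty_like(t_eval)
    for k, t0 in enumerate(t_eval):
        u = (t_all - t0) / h
        K = w_all * np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # weighted Gaussian kernel
        # Weighted least squares fit of a local line; the intercept is the estimate at t0
        S0, S1, S2 = K.sum(), (K * u).sum(), (K * u**2).sum()
        R0, R1 = (K * y_all).sum(), (K * u * y_all).sum()
        mu_hat[k] = (S2 * R0 - S1 * R1) / (S0 * S2 - S1**2)
    return mu_hat
```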
-
Data produced by resting-state functional Magnetic Resonance Imaging are widely used to infer brain functional connectivity networks. Such networks correlate neural signals to connect brain regions, which consist of groups of dependent voxels. Previous work has focused on aggregating data across voxels within predefined regions. However, the presence of within-region correlations has noticeable impacts on inter-regional correlation detection, and thus on edge identification. To alleviate these impacts, we propose to leverage techniques from the large-scale correlation screening literature and derive simple and practical characterizations of the mean number of correlation discoveries that flexibly incorporate intra-regional dependence structures. A connectivity network inference framework is then presented. First, inter-regional correlation distributions are estimated. Then, correlation thresholds that can be tailored to one's application are constructed for each edge. Finally, the proposed framework is implemented on synthetic and real-world datasets. This novel approach for handling arbitrary intra-regional correlation is shown to limit false positives while improving true positive rates.
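As a simplified illustration of per-edge thresholding, the sketch below picks, for each region pair, the smallest correlation threshold keeping the expected number of inter-regional discoveries below a tolerance, using the fact that under a Gaussian independence null the squared sample correlation follows a Beta(1/2, (T-2)/2) law. The paper's characterizations additionally incorporate intra-regional dependence, which this sketch deliberately omits; the function names, the threshold grid, and the tolerance are illustrative assumptions.

```python
import numpy as np
from scipy.stats import beta

def edge_threshold(p_a, p_b, T, tol=0.05):
    """Smallest correlation threshold rho such that the expected number of pairwise
    discoveries between two regions (p_a and p_b voxels, T time points) stays below
    `tol` under an independence null. Assumes the tolerance is attainable on the grid."""
    rhos = np.linspace(0.01, 0.99, 500)
    expected = p_a * p_b * beta.sf(rhos**2, 0.5, (T - 2) / 2.0)
    return rhos[np.argmax(expected <= tol)]

def detect_edge(A, B, tol=0.05):
    """Declare an edge between regions A (T, p_a) and B (T, p_b) if any inter-regional
    voxel pair exceeds the per-edge threshold."""
    T, p_a = A.shape
    rho = edge_threshold(p_a, B.shape[1], T, tol)
    C = np.corrcoef(A, B, rowvar=False)[:p_a, p_a:]   # inter-regional correlation block
    return np.abs(C).max() > rho, rho
```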
-
A novel non-parametric estimator of the correlation between grouped measurements of a quantity is proposed in the presence of noise. The main motivation is functional brain network construction from fMRI data, where brain regions correspond to groups of spatial units, and correlation between region pairs defines the network. The challenge resides in the fact that both noise and intra-regional correlation lead to inconsistent inter-regional correlation estimation using classical approaches. While some existing methods handle either one of these issues, no non-parametric approach tackles both simultaneously. To address this problem, a trade-off between two procedures is proposed: correlating regional averages, which is not robust to intra-regional correlation; and averaging pairwise inter-regional correlations, which is not robust to noise. To that end, the data are projected onto a space where Euclidean distance serves as a proxy for sample correlation. Hierarchical clustering is then leveraged to gather highly correlated variables within each region prior to inter-regional correlation estimation. The convergence of the proposed estimator is analyzed, and the proposed approach is empirically shown to surpass several other popular methods in terms of quality. Illustrations on real-world datasets further demonstrate its effectiveness.
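The general idea can be sketched as follows: standardize each voxel's time series so that Euclidean distance tracks correlation, hierarchically cluster voxels within a region, average within clusters to suppress noise, and then correlate the cluster representatives across regions. This is a sketch under those assumptions, not the paper's exact estimator; the cut-off value, the average-linkage choice, and the final averaging of the inter-cluster correlation block are all illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_region(X, corr_cut=0.7):
    """Group the columns (voxels) of X (T, p) into clusters of highly correlated
    variables, then average within each cluster to suppress noise.
    For standardized columns, ||x - y||^2 = 2 T (1 - corr(x, y)), so Euclidean
    distance is a monotone proxy for correlation."""
    Z = (X - X.mean(0)) / X.std(0)
    T = X.shape[0]
    link = linkage(Z.T, method="average", metric="euclidean")
    cut = np.sqrt(2 * T * (1 - corr_cut))            # distance equivalent of corr_cut
    labels = fcluster(link, t=cut, criterion="distance")
    return np.column_stack([Z[:, labels == k].mean(1) for k in np.unique(labels)])

def inter_region_correlation(X_a, X_b, corr_cut=0.7):
    """Average the pairwise correlations between the cluster representatives of two regions."""
    A, B = cluster_region(X_a, corr_cut), cluster_region(X_b, corr_cut)
    p_a = A.shape[1]
    return np.corrcoef(A, B, rowvar=False)[:p_a, p_a:].mean()
```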
-
Biosensor data have the potential to improve disease control and detection. However, the analysis of these data under free-living conditions is not feasible with current statistical techniques. To address this challenge, we introduce a new functional representation of biosensor data, termed the glucodensity, together with a data analysis framework based on distances between them. The new data analysis procedure is illustrated through an application in diabetes with continuous glucose monitoring (CGM) data. In this domain, we show marked improvement with respect to state-of-the-art analysis methods. In particular, our findings demonstrate that (i) the glucodensity possesses an extraordinary clinical sensitivity to capture the typical biomarkers used in standard clinical practice in diabetes; (ii) previous biomarkers cannot accurately predict the glucodensity, so the latter is a richer source of information; and (iii) the glucodensity is a natural generalization of the time-in-range metric, the gold standard in the handling of CGM data. Furthermore, the new method overcomes many of the drawbacks of time-in-range metrics and provides more in-depth insight into assessing glucose metabolism.
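A minimal sketch of the pipeline suggested by the abstract: build a subject's glucodensity as a density estimate of their CGM readings, recover time in range as the density mass on a glucose interval, and compare subjects with a 2-Wasserstein distance between their glucose distributions. The glucose grid, the 70-180 mg/dL range, the Gaussian kernel, and the quantile-based distance approximation are assumptions about common CGM conventions, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import gaussian_kde

GRID = np.linspace(40, 400, 361)   # assumed glucose grid in mg/dL

def glucodensity(glucose, grid=GRID):
    """Kernel density estimate of one subject's CGM readings on a common grid."""
    return gaussian_kde(glucose)(grid)

def time_in_range(glucose, lo=70, hi=180, grid=GRID):
    """Time in range recovered as the glucodensity mass on [lo, hi]."""
    f = glucodensity(glucose, grid)
    mask = (grid >= lo) & (grid <= hi)
    return f[mask].sum() * (grid[1] - grid[0])

def wasserstein2(glucose_a, glucose_b, probs=np.linspace(0.01, 0.99, 99)):
    """Approximate 2-Wasserstein distance between two subjects' glucose distributions,
    computed from empirical quantile functions via a Riemann sum over `probs`."""
    qa, qb = np.quantile(glucose_a, probs), np.quantile(glucose_b, probs)
    return np.sqrt(np.mean((qa - qb) ** 2) * (probs[-1] - probs[0]))
```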
-
We develop a quantitative framework for understanding the class of wicked problems that emerge at the intersections of natural, social, and technological complex systems. Wicked problems reflect our incomplete understanding of interdependent global systems and the systemic risk they pose; such problems escape solutions because they are often ill-defined, and thus mis-identified and under-appreciated by communities of problem-solvers. While there are well-documented benefits to tackling boundary-crossing problems from various viewpoints, the integration of diverse approaches can nevertheless contribute to confusion around the collective understanding of the core concepts and feasible solutions. We explore this paradox by analyzing the development of both scholarly (social) and topical (cognitive) communities (two facets of knowledge production studied here that contribute to the evolution of knowledge in and around a problem, termed a knowledge trajectory) associated with three wicked problems: deforestation, invasive species, and wildlife trade. We posit that saturation in the dynamics of social and cognitive diversity growth is an indicator of reduced uncertainty in the evolution of the comprehensive knowledge trajectory emerging around each wicked problem. Informed by comprehensive bibliometric data capturing both social and cognitive dimensions of each problem domain, we thereby develop a framework that assesses the stability of knowledge trajectory dynamics as an indicator of wickedness associated with conceptual and solution uncertainty. As such, our results identify wildlife trade as a wicked problem that may be difficult to address given the recent instability in its knowledge trajectory.
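One rough way to operationalize "saturation in diversity growth" is to fit a saturating curve to a yearly diversity series and check how close the series is to its fitted plateau. The logistic form, the parameter names, and the plateau-fraction summary below are assumptions made for illustration only and do not represent the paper's actual stability measure.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Saturating (logistic) growth toward a carrying capacity K."""
    return K / (1.0 + np.exp(-r * (t - t0)))

def saturation_fit(years, diversity):
    """Fit a logistic curve to a yearly diversity series (e.g., distinct topics or
    author communities active in a problem domain per year) and report how close the
    latest value is to the fitted plateau, as a crude indicator of trajectory stability."""
    years = np.asarray(years, dtype=float)
    diversity = np.asarray(diversity, dtype=float)
    p0 = [diversity.max() * 1.5, 0.5, np.median(years)]        # illustrative starting values
    (K, r, t0), _ = curve_fit(logistic, years, diversity, p0=p0, maxfev=10000)
    return {"capacity": K, "rate": r, "midpoint": t0,
            "fraction_of_plateau": diversity[-1] / K}
```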
